Audio-visual phoneme classification for pronunciation training applications
Abstract
We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal, to a principal or independent component (PCA or ICA) representation of the mouth region in video images, or to both. We investigate which representation (PCA or ICA) is most suitable, and how many basis components are required, for automatically detecting pronunciation errors in Swedish from audio-visual input. Experiments performed on one speaker show that the visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to perform phoneme classification on audio and video separately and then fuse the results than to combine the two before classification; and that PCA outperforms ICA for fewer than 50 components.
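The finding that separate classification followed by fusion works better than combining the inputs first can be illustrated with a minimal sketch. It assumes each single-modality classifier outputs a posterior probability per phoneme class. The function name `fuse_posteriors`, the weighted product rule, and the toy numbers are illustrative assumptions, not the combination scheme or data used in the paper.

```python
def fuse_posteriors(audio_post, video_post, audio_weight=0.5):
    """Late fusion (hypothetical helper): combine per-class posteriors from
    two separately trained classifiers with a weighted product rule, then
    renormalize so the fused scores sum to 1."""
    if set(audio_post) != set(video_post):
        raise ValueError("both classifiers must score the same phoneme classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    scores = {
        phoneme: (audio_post[phoneme] ** audio_weight)
        * (video_post[phoneme] ** video_weight)
        for phoneme in audio_post
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("the classifiers assign zero probability to every class")
    return {phoneme: score / total for phoneme, score in scores.items()}


# Toy posteriors (made-up numbers): the audio classifier slightly prefers /n/,
# while the video classifier sees closed lips and strongly prefers /m/.
audio_post = {"m": 0.45, "n": 0.55}
video_post = {"m": 0.90, "n": 0.10}

fused = fuse_posteriors(audio_post, video_post)
decision = max(fused, key=fused.get)
print(decision, round(fused[decision], 2))
```

In this toy case the audio-only decision would be /n/, and the fused decision follows the more confident visual evidence. By contrast, an early-fusion design would concatenate the audio features and the PCA or ICA coefficients into one vector and train a single classifier on it.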
Similar resources
Audio-visual classification of Swedish phonemes for pronunciation training
We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal and/or either a principal or an independent component (PCA or ICA) representation of the mouth region in video images. We investigate which representation (PCA or ICA) is most suitable and t...
A System Demonstration of a Framework for Computer Assisted Pronunciation Training
In this paper, we demonstrate a system implementation of a framework for computer-assisted pronunciation training for second language (L2) learners. This framework supports iterative improvement of automatic pronunciation error recognition and classification by allowing integration of annotated error data. The annotated error data is acquired via an annotation tool for linguists. This pap...
A real-time articulatory visual feedback approach with target presentation for second language pronunciation learning.
Articulatory information can support learning or remediating pronunciation of a second language (L2). This paper describes an electromagnetic articulometer-based visual-feedback approach using an articulatory target presented in real time to facilitate L2 pronunciation learning. This approach trains learners to adjust articulatory positions to match targets for an L2 vowel estimated from product...
Based Persian Viseme Clustering
Viseme (Visual Phoneme) clustering in every language is among the most important conducting various multimedia researches as reading, lip synchronization and com pronunciation training applications. With re that clustering and analyzing visemes are lan processes, we concentrated our research on P which indeed has suffered from lack of su paper, we used a hierarchical approach for c in Persian langu...
Outline: Applications of Neural Nets Nettalk -learning Pronunciation of English Text Classifying Sonar Targets 16.1 Nettalk 16.1.1 Overview Phoneme String Text Speech Figure 16.1: a Text-to-speech System Using Nettalk
NETtalk is a classic example of a back-propagation trained multi-layer perceptron network applied to a practical problem. NETtalk, created by Sejnowski and Rosenberg [1], applies a multi-layer network to the text-to-speech problem. The goal is to develop a system which can convert English text into its underlying sequence of phonemes and stress markers. The string of phonemes and stress mar...
Publication date: 2007